Main
Shiya Liu
I have made visualizations viewed by hundreds of thousands of people, sped up query times for 25 terabytes of data by an average of 4,800 times, and built packages for R that let you do magic.
Education
PhD Candidate, Biostatistics
Vanderbilt University
Nashville, TN
2020 - 2015
- Focused on network models & interactive visualization platforms for electronic health records data
- University Graduate Fellow
B.S., Mathematics, Statistics (minor C.S.)
University of Vermont
Burlington, VT
2015 - 2011
- Thesis: An agent based model of Diel Vertical Migration patterns of Mysis diluviana
Research Experience
Graduate Research Assistant
TBILab (Yaomin Xu’s Lab)
Vanderbilt University
Current - 2015
- Primarily working with large EHR and Biobank datasets.
- Developing network-based methods to investigate and visualize clinically relevant patterns in data.
Data Science Researcher
Data Science Lab
Johns Hopkins University
2018 - 2017
- Built R Shiny applications for wearables research and statistics education.
- Work primarily done in R Shiny and JavaScript (Node.js and D3.js).
Undergraduate Researcher
Rubenstein Ecosystems Science Laboratory
University of Vermont
2015 - 2013
- Analyzed and visualized data for the CATOS fish-tracking project.
- Headed a data-mining project to establish temporal trends in population densities of Mysis diluviana (Mysis).
- Ran a project to mathematically model the migration patterns of Mysis (honors thesis project).
Human Computer Interaction Researcher
LabInTheWild (Reinecke Lab)
University of Michigan
2015 - 2015
- Led development and implementation of interactive data visualizations to help users compare themselves to other demographics.
Undergraduate Researcher
Bentil Laboratory
University of Vermont
2014 - 2013
- Developed a mathematical model to predict the transport of sulfur through the environment, with applications in waste cleanup.
Research Assistant
Adair Laboratory
University of Vermont
2013 - 2012
- Independently analyzed and constructed statistical models for large data sets pertaining to carbon decomposition rates.
Industry Experience
I have worked in a variety of roles ranging from journalist to software engineer to data scientist. I like collaborative environments where I can learn from my peers.
Software Engineer
RStudio
Remote
Current - 2020
- Working on the Shiny team to make programming web applications with R easier and more beautiful.
Data Journalist - Graphics Department
New York Times
New York, New York
2016 - 2016
- Reporter with the graphics desk covering topics in science, politics, and sport.
- Work primarily done in R, Javascript, and Adobe Illustrator.
Engineering Intern - User Experience
Dealer.com
Burlington, VT
2015 - 2015
- Built an internal tool to help analyze and visualize user interaction with back-end products.
Data Science Intern
Dealer.com
Burlington, VT
2015 - 2015
- Worked with the product analytics team to help parse and visualize large stores of data to drive business decisions.
Data Artist In Residence
Conduce
Carpinteria, CA
2015 - 2014
- Envisioned, prototyped, and implemented a visualization framework over the course of one month.
- Constructed a training protocol to bring third parties up to speed with the new framework.
Software Engineering Intern
Conduce
Carpinteria, CA
2014 - 2014
- Incorporated d3.js into the company’s main software platform.
Teaching Experience
I am passionate about education. I believe that no topic is too complex if the teacher is empathetic and willing to think about new methods of approaching the task.
Javascript for Shiny Users
RStudio::conf 2020
N/A
2020
- Served as TA for a two-day workshop on how to leverage JavaScript in Shiny applications.
- Lectured on using the r2d3 package to build interactive visualizations.
Data Visualization Best Practices
DataCamp
N/A
2019 - 2019
- Designed a course from the ground up to teach best practices for scientific visualizations.
- Uses R and ggplot2.
- In the top 10% of courses on the platform by popularity.
Improving your visualization in Python
DataCamp
N/A
2019 - 2019
- Designed a course from the ground up to teach advanced methods for enhancing visualizations.
- Uses Python, matplotlib, and seaborn.
Advanced Statistical Learning and Inference
Vanderbilt Biostatistics Department
Nashville, TN
2018 - 2017
- Served as TA and lectured.
- Topics ranged from penalized regression to boosted trees and neural networks.
- Highest-level course offered in the department.
Advanced Statistical Computing
Vanderbilt Biostatistics Department
Nashville, TN
2018 - 2018
- Served as TA and lectured.
- Covered modern statistical computing algorithms.
- Fourth-year PhD-level class.
Statistical Computing in R
Vanderbilt Biostatistics Department
Nashville, TN
2017 - 2017
- Served as TA and lectured.
- Covered an introduction to the R language for statistics applications.
- Graduate-level class.
Selected Data Science Writing
I regularly write about data science and visualization on my blog, LiveFreeOrDichotomize.
Using AWK and R to Parse 25tb
LiveFreeOrDichotomize.com
N/A
2019
- Story of parsing large amounts of genomics data.
- Provided advice for dealing with data much larger than disk.
- Reached top of HackerNews.
Classifying physical activity from smartphone data
RStudio Tensorflow Blog
N/A
2018
- Walkthrough of training a convolutional neural network to achieve state-of-the-art recognition of activities from accelerometer data.
- Contracted article.
The United States of Seasons
LiveFreeOrDichotomize.com
N/A
2018
- GIS analysis of weather data to find the most ‘seasonal’ locations in the United States.
- Used Bayesian regression methods for smoothing sparse geospatial data.
A year as told by fitbit
LiveFreeOrDichotomize.com
N/A
2017
- Analyzed a full year’s worth of second-level heart rate data from a wearable device.
- Demonstrated visualization-based inference for large data.
MCMC and the case of the spilled seeds
LiveFreeOrDichotomize.com
N/A
2017
- Full Bayesian MCMC sampler running in your browser.
- Coded from scratch in vanilla JavaScript.
The Traveling Metallurgist
LiveFreeOrDichotomize.com
N/A
2017
- Pure JavaScript implementation of a traveling salesman solution using simulated annealing.
- Allows the reader to customize the number and location of cities to attempt to trick the algorithm.
Selected Press (About)
Great paper? Swipe right on the new ‘Tinder for preprints’ app
Science
N/A
2017 - 2017
- Story of the app Papr, made with Jeff Leek and Lucy D’Agostino McGowan.
Swipe right for science: Papr app is ‘Tinder for preprints’
Nature News
N/A
2017 - 2017
- Second press article about the Papr app.
The Deeper Story in the Data
University of Vermont Quarterly
N/A
2016 - 2016
- Story on my path after graduation and the power of narrative.
Selected Press (By)
The Great Student Migration
The New York Times
N/A
2016 - 2016
- Most shared and discussed article from the New York Times for August 2016.
Wildfires are Getting Worse
The New York Times
N/A
2016 - 2016
- GIS analysis and modeling of fire patterns and trends.
- Data in collaboration with NASA and USGS.
Who’s Speaking at the Democratic National Convention?
The New York Times
N/A
2016 - 2016
- Data scraped from C-SPAN records to figure out who talked at past conventions.
Who’s Speaking at the Republican National Convention?
The New York Times
N/A
2016 - 2016
- Used the same data-scraping techniques as ‘Who’s Speaking at the Democratic National Convention?’
A Trail of Terror in Nice, Block by Block
The New York Times
N/A
2016 - 2016
- Led the research effort to put together the story of the 2016 terrorist attack in Nice, France, in less than 12 hours.
- Work won a silver medal at Malofiej 2017 and gold at the Society for News Design.
Selected Publications, Posters, and Talks
Building a software package in tandem with machine learning methods research can result in both more rigorous code and more rigorous research
ENAR 2020
N/A
2020
- Invited talk in the Human Data Interaction section.
- How and why building an R package can benefit methodological research.
Stochastic Block Modeling in R, Statistically rigorous clustering with rigorous code
RStudio::conf 2020
N/A
2020
- Invited talk about the new sbmR package.
- Focus on how software development and methodological research can both benefit when done in tandem.
PheWAS-ME: A web-app for interactive exploration of multimorbidity patterns in PheWAS
Bioinformatics
N/A
2020
- Manuscript detailing an application for the exploration of multimorbidity patterns in PheWAS analyses.
- See landing page for more information.
Charge Reductions Associated with Shortening Time to Recovery in Septic Shock
Chest
N/A
2019 - 2019
- Authored with Wesley H. Self, MD MPH; Dandan Liu, PhD; Stephan Russ, MD, MPH; Michael J. Ward, MD, PhD, MBA; Nathan I. Shapiro, MD, MPH; Todd W. Rice, MD, MSc; Matthew W. Semler, MD, MSc.
Multimorbidity Explorer | A shiny app for exploring EHR and biobank data
RStudio::conf 2019
N/A
2019 - 2019
- Contributed Poster. Authored with Yaomin Xu.
Taking a network view of EHR and Biobank data to find explainable multivariate patterns
Vanderbilt Biostatistics Seminar Series
N/A
2019 - 2019
- University-wide seminar series.
Patient-specific risk factors independently influence survival in Myelodysplastic Syndromes in an unbiased review of EHR records
Under review (copy available upon request)
N/A
2019
- Bayesian network analysis used to find novel subgroups of patients with Myelodysplastic Syndromes (MDS).
- Analysis done using a method built for my dissertation.
Patient specific comorbidities impact overall survival in myelofibrosis
Under review (copy available upon request)
N/A
2019
- Bayesian network analysis used to find robust novel subgroups of patients with given genetic mutations.
- Analysis done using a method built for my dissertation.
R timelineViz: Visualizing the distribution of study events in longitudinal studies
Under review (copy available upon request)
N/A
2018 - 2018
- Authored with Alex Sunderman of the Vanderbilt Department of Epidemiology.
Continuous Classification using Deep Neural Networks
Vanderbilt Biostatistics Qualification Exam
N/A
2017 - 2017
- Review of methods for classifying continuous data streams using neural networks.
- Successfully met qualifying examination standards.
Asymmetric Linkage Disequilibrium: Tools for Dissecting Multiallelic LD
Journal of Human Immunology
N/A
2015 - 2015
- Authored with Richard Single, Vanja Paunic, Mark Albrecht, and Martin Maiers.
An Agent Based Model of Mysis Migration
International Association of Great Lakes Research Conference
N/A
2015 - 2015
- Authored with Brian O’Malley, Sture Hansson, and Jason Stockwell.
Declines of Mysis diluviana in the Great Lakes
Journal of Great Lakes Research
N/A
2015 - 2015
- Authored with Peter Euclide and Jason Stockwell.